Domain ontology driven approach for bidding webpage parsing
MA Dongxue, SONG She, XIE Zhenping, LIU Yuan
Journal of Computer Applications, 2020, 40(6): 1574-1579. DOI: 10.11772/j.issn.1001-9081.2019101792

To address the low efficiency of parsing bidding webpages with regular expressions, a new automatic parsing method based on a bidding ontology model was proposed. First, the structural features of bidding webpage texts were analyzed. Next, a lightweight domain knowledge model based on a bidding ontology was constructed. Finally, a new algorithm for semantically matching and extracting bidding webpage elements was introduced to parse bidding webpages automatically. Experimental results show that, with adaptive parsing, the proposed method reaches an accuracy of 95.33% and a recall of 88.29%, which are 3.98 and 3.81 percentage points higher, respectively, than those of the regular-expression method. The proposed method can adaptively parse bidding webpages into structured form and extract their semantic information, and it meets the requirements of practical applications.
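The abstract does not spell out the matching algorithm. As a rough illustration of ontology-driven extraction only, the sketch below assumes a tiny hand-written ontology in which each concept (project name, purchaser, budget, deadline) carries a few surface labels and an expected value pattern. The concept names, labels, patterns, the 0.8 threshold, and the use of `difflib` string similarity as a stand-in for semantic matching are all assumptions, not the authors' model.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass(frozen=True)
class Concept:
    """A hypothetical ontology concept: canonical name, known labels, expected value shape."""
    name: str
    labels: tuple[str, ...]
    value_pattern: str = r".+"

# Illustrative fragment of a bidding ontology (assumed, not the paper's model).
BIDDING_ONTOLOGY = (
    Concept("project_name", ("project name", "project title")),
    Concept("purchaser", ("purchaser", "tendering unit", "purchasing unit")),
    Concept("budget", ("budget", "budget amount", "maximum price"), r"[\d,]+(?:\.\d+)?"),
    Concept("deadline", ("deadline", "bid deadline", "closing date"), r"\d{4}-\d{2}-\d{2}"),
)

def label_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a simple stand-in for semantic matching."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def parse_bidding_text(text: str, ontology=BIDDING_ONTOLOGY, threshold: float = 0.8) -> dict[str, str]:
    """Map 'label: value' lines of a bidding page to ontology concepts."""
    record: dict[str, str] = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            continue  # not a label/value line
        label, value = label.strip(), value.strip()
        # Choose the concept whose closest known label best matches this label.
        best, best_score = None, 0.0
        for concept in ontology:
            score = max(label_similarity(label, known) for known in concept.labels)
            if score > best_score:
                best, best_score = concept, score
        if best is None or best_score < threshold or best.name in record:
            continue  # unknown label, or concept already filled
        # Accept the value only if it has the shape the concept expects.
        match = re.search(best.value_pattern, value)
        if match:
            record[best.name] = match.group(0)
    return record

if __name__ == "__main__":
    page = """Project Name: Campus network upgrade
Tendering Unit: City No. 3 Middle School
Budget Amount: 1,250,000.00 yuan
Bid deadline: 2020-07-15
Contact: Mr. Wang"""
    print(parse_bidding_text(page))
```

Here the `Contact` line matches no concept and is skipped, while the budget value is trimmed to its numeric part by the concept's pattern. In this kind of design the extraction knowledge sits in the ontology as data, so a new label variant can be added once rather than as another site-specific regular expression.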

Efficient block-based sampling algorithm for aggregation query processing on duplicate charged records
PAN Mingyu, ZHANG Lu, LONG Guobiao, LI Xianglong, MA Dongxue, XU Liang
Journal of Computer Applications, 2018, 38(6): 1596-1600. DOI: 10.11772/j.issn.1001-9081.2017112632
Existing query analysis methods usually treat entity resolution as an offline preprocessing step that cleans the whole dataset. However, as data volumes keep growing, such computationally expensive offline cleaning can no longer meet the real-time analysis needs of most applications. To support aggregation queries on duplicate charged records, a new method integrating entity resolution with approximate aggregation query processing was proposed. First, a block-based sampling strategy was adopted to collect samples. Then, an entity recognition method was applied to identify duplicate entities in the sampled data. Finally, an unbiased estimate of the aggregate result was reconstructed from the entity recognition results. The proposed method avoids the time cost of identifying all entities and returns query results that satisfy user needs by identifying entities in only a small sample of the data. Experimental results on both a real dataset and a synthetic dataset demonstrate the efficiency and reliability of the proposed method.
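The following is a minimal sketch of that pipeline for a SUM query, under assumptions the abstract does not state: records are grouped into blocks by a hypothetical blocking key such that all duplicates of one entity fall in the same block, blocks are drawn by simple random sampling without replacement, and entity recognition is reduced to exact matching on a normalized name. Because each block is included with probability n/N (N blocks in total, n sampled), scaling the deduplicated sample total by N/n gives an unbiased estimate of the deduplicated SUM. This is a textbook estimator, not necessarily the one reconstructed in the paper; all function names are illustrative.

```python
import random
from collections import defaultdict

def build_blocks(records, blocking_key):
    """Group records into blocks by a (hypothetical) blocking key.

    Assumes every duplicate of an entity shares the same key, so
    duplicates never straddle two blocks.
    """
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec)
    return list(blocks.values())

def resolve_block(block, entity_key):
    """Toy entity resolution: records with equal normalized keys are one entity.

    Keeps the first record seen for each entity; a stand-in for a real
    entity recognition method.
    """
    entities = {}
    for rec in block:
        entities.setdefault(entity_key(rec), rec)
    return list(entities.values())

def scaled_sum(sample, n_blocks, value, entity_key):
    """Expand the deduplicated SUM of the sampled blocks by N / n."""
    deduped_total = sum(value(rec) for block in sample for rec in resolve_block(block, entity_key))
    return n_blocks / len(sample) * deduped_total

def estimate_sum(blocks, n_sample, value, entity_key, rng=random):
    """Draw n_sample blocks without replacement and return an unbiased SUM estimate."""
    sample = rng.sample(blocks, n_sample)
    return scaled_sum(sample, len(blocks), value, entity_key)

if __name__ == "__main__":
    # Hypothetical records: (name, city, amount); two entities appear twice.
    records = [
        ("Alice Zhang", "Nanjing", 100.0),
        ("alice  zhang", "Nanjing", 100.0),
        ("Bob Li", "Nanjing", 50.0),
        ("Carol Wu", "Shanghai", 80.0),
        ("CAROL WU", "Shanghai", 80.0),
        ("Dan Xu", "Beijing", 30.0),
        ("Eve Ma", "Wuxi", 40.0),
    ]
    blocks = build_blocks(records, blocking_key=lambda r: r[1])
    name_key = lambda r: " ".join(r[0].split()).lower()
    amount = lambda r: r[2]
    print(estimate_sum(blocks, len(blocks), amount, name_key))  # all blocks: 300.0
    print(estimate_sum(blocks, 2, amount, name_key, random.Random(7)))
```

With all four blocks sampled, the estimate equals the exact deduplicated total of 300.0, whereas a naive sum over the raw records would give 480.0. With fewer blocks, individual estimates vary, but their average over all possible samples equals the exact total, which is what "unbiased" means here.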